Chinese character association measurement method and its application on Chinese text similarity analysis
Zhao Yanbin
Journal of Computer Applications   
Most research on text similarity analysis and text clustering is based on feature words. Because Chinese text has no natural delimiter between words, such methods must deal with two problems: Chinese word segmentation and high-dimensional feature vector spaces. To reduce this complexity, a novel method of text similarity analysis was investigated that uses the association between Chinese characters instead of feature words. The notion of Chinese Character Association Measurement was defined, and a Chinese Character Association Measurement matrix was constructed to represent Chinese text documents. A Chinese text similarity algorithm based on this matrix was then proposed. The experimental results show that the Chinese Character Association Measurement performs better than mutual information, the t-test, and bi-gram frequency. Because it requires no Chinese word segmentation, the algorithm is suitable for massive Chinese corpora.
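The abstract does not give the formula for the Chinese Character Association Measurement, so the following is only a rough sketch of the general idea of comparing texts through a character-pair matrix without word segmentation. As a stand-in for the paper's measure it uses plain adjacent-character (bi-gram) counts, which the abstract names as one of the baselines, and it compares the two matrices with cosine similarity, which is an assumption rather than the paper's algorithm. The function names `char_pair_matrix`, `cosine_similarity`, and `text_similarity` are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_pair_matrix(text: str) -> Counter:
    """Sparse character-by-character matrix: (c1, c2) -> adjacent co-occurrence count.

    Stand-in only: raw bi-gram counts, NOT the paper's association measure.
    Whitespace is dropped; punctuation is kept as ordinary characters.
    """
    chars = [c for c in text if not c.isspace()]
    return Counter(zip(chars, chars[1:]))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse matrices treated as flat vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty or one-character text has no character pairs
    return dot / (norm_a * norm_b)


def text_similarity(t1: str, t2: str) -> float:
    """Similarity of two texts computed from their character-pair matrices."""
    return cosine_similarity(char_pair_matrix(t1), char_pair_matrix(t2))


if __name__ == "__main__":
    print(text_similarity("我喜欢自然语言处理", "我喜欢机器学习"))
```

Replacing the raw counts in `char_pair_matrix` with a different association score (for example mutual information or a t-test statistic, the other baselines the abstract mentions) would leave the rest of the sketch unchanged.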